Non-specific DNA-protein interaction: Why proteins can diffuse along DNA 
The structure of DNA binding proteins (DNA-BPs) enables a strong interaction with their specific target site on DNA.
However, recent single-molecule experiments reported that proteins can diffuse along DNA. This suggests that the
interactions between proteins and DNA play a role during the target search even far from the specific site. It is
unclear how these non-specific interactions optimize the search process, and how the protein structure comes
into play. Since each nucleotide is negatively charged, one may expect the positive surface of DNA-BPs
to collapse electrostatically onto DNA. Here we show by means of Monte Carlo simulations and analytical
calculations that a counter-intuitive repulsion between the two oppositely charged macromolecules exists at a
nanometer range. We also show that this repulsion is due to a local increase of the osmotic pressure exerted
by the ions that are trapped at the interface. For the concave shape of DNA-BPs, and for realistic protein
charge densities, we find that the repulsion pushes the protein into a free energy minimum at a distance from
DNA. As a consequence, a favorable path exists along which proteins can slide without interacting with the
DNA bases. When a protein encounters its target, the osmotic barrier is completely counterbalanced by the
H-bond interaction, thus enabling sequence recognition.



DNA stores the genetic material of all living cells and viruses.
This huge amount of information is effective only if DNA binding
proteins (DNA-BPs) manipulate DNA at very specific locations.
When the protein finds its DNA target, the shape complementarity
of the DNA-BP and its specific DNA sequence maximizes the number
of hydrogen bonds, thus leading to a strong protein-DNA
association. The rate of protein-DNA association is, however, not
controlled by the association step itself, but by the whole search
process. It is now well established that DNA-BPs diffuse along DNA
before they reach their specific site [7]. During this search, the
only interactions between protein and DNA that can play a role are
non-sequence-specific. These non-specific interactions between
protein and DNA remain poorly documented. Although the
predominance of electrostatics is unquestionable [5, 6], it remains
unclear how the protein structure comes into play [5-7]. Does the
typical concavity of DNA-BPs, which favors the specific association,
also influence the non-specific electrostatic interaction? In
DNA-protein complexes, the mean charge of the protein residues
located at the interface is positive. Nevertheless, structural
studies of non-specific complexes have shown that the protein atoms
and the DNA atoms are weakly packed together at the interface, thus
suggesting that a force counterbalances the electrostatic attraction.
In this letter, our purpose is
to establish the general mechanisms that control the mean force
between protein and DNA and that apply to a wide variety of DNA-BPs.
With that goal in mind, we design coarse-grained DNA and protein
models, rather than detailed atomic models, and investigate their
interactions. First, we show that a short-range repulsion exists
when the shape of the protein is complementary to the shape of DNA.
Second, we show that this repulsion increases when the protein
charge decreases, and we unravel the underlying physical mechanism.
Finally, we discuss in detail why this phenomenon is relevant to
real biological systems, using statistical data on the protein
charge and on the number of H-bonds between protein and DNA.

The most characteristic feature of DNA-BPs is their shape
complementarity with DNA. As a matter of fact, concave DNA-BPs can
cover the convex DNA with up to 35% of their surface [1]. At close
contact, these interface regions exclude the solvent molecules and
form numerous weak bonds with DNA (mainly H-bonds [1]). As a first
step, we artificially switch off these H-bond interactions. To probe
the influence of protein shape on the non-specific electrostatic
interaction, we monitor changes in the potential of mean force upon
modifying the curvature of smooth model proteins along the DNA
direction (noted C∥) and in the perpendicular direction (C⊥) (see
Fig. 1a). The charge of all model proteins is given by a single +5e
site placed 0.7 nm under the protein surface facing DNA. The direct
electrostatic force in vacuum is therefore the same for any protein
shape investigated here. The DNA is modelled as a hard cylinder with
divalent charged sites. The water and the electrolyte ions are
described by the primitive model of electrolyte solutions [8]. This
model has already been used to explain less intuitive trends of
electrostatic interactions in solution, e.g. the attraction between
like-charged particles [9], or the repulsion between charged and
neutral ones [10]. The relative permittivity of water ε_r is taken
equal to 78.25, and the radius of the salt ions is 0.15 nm.

The potential of mean force between a protein and a DNA molecule
separated by a distance L is equal to the free energy of the global
system (protein, DNA and ions in water). At a fixed surface-to-surface
distance L, this free energy depends only on the ion distribution. We
thus compute it by means of canonical Monte Carlo (MC) simulations
that sample the ion configurations [11, 12]. We deliberately freeze
the rotational degrees of freedom of the protein, and study the
interaction for the most attractive orientation, with the protein
cavity pointing toward DNA. Indeed, this orientation is the one
always observed for specific and non-specific complexes, and we
observed that the free energy becomes abruptly more repulsive when
the protein rotates. The protein and DNA are placed in a
parallelepipedic simulation box (275 x 275 x 150 nm) with periodic
boundaries. The results are reported in Fig. 1b.

FIG. 1: Influence of the protein shape on the interaction. a, Schematic
view of the model proteins. The height and diameter of the cylindrical
proteins (2, 3) are both 5 nm, as are the side of the cubic protein (4)
and the radius of the sphere. The hollow cylindrical proteins (3) have a
cylindrical cavity of curvature C⊥ = 0, -0.25, -0.5 or -1 nm⁻¹. b, Free
energy of the DNA-protein systems computed by MC simulations. The
protein and DNA are immersed in a monovalent salt whose Debye length
λ_D, of the order of a nanometer, corresponds to physiological
conditions [13]. The standard deviation of the free energy is 0.2 k_BT.

The curvature C∥ only slightly influences the range of the
interaction, as illustrated by the comparison of spherical and
cylindrical proteins. The effect of the curvature C⊥ is remarkably
more pronounced. The free energy as a function of L, which is
monotonic for C⊥ > 0, becomes non-monotonic for sufficiently
negative C⊥ and then exhibits a minimum F_min at a distance L_min.
For L < L_min, there is an unexpected repulsive free energy barrier
between the oppositely charged bodies, which reaches ~5 k_BT in the
case of perfectly matching surfaces (C⊥ = -1/R_DNA). This behavior is
weakly influenced by the shape of the remaining surface of the
protein: F_min varies, e.g., from -4.9 k_BT for a cubic protein to
-5.4 k_BT for a cylindrical one.

Once the role of the protein curvature is established, we perform
simulations of concave DNA-BP models with various charge patterns to
assess the influence of the protein charge on the interaction. When
the pattern changes at constant interface charge density σ_prot, the
free energy exhibits only minor variations (data not shown).
Conversely, σ_prot strongly modulates the free energy profile
(Fig. 2). For an interface of, e.g., 15 nm², if σ_prot changes from
0.13|σ_DNA| to 0.39|σ_DNA|, F_min decreases dramatically from -2 k_BT
to -14 k_BT and L_min decreases from 0.75 nm to 0.1 nm.

FIG. 2: Influence of the protein charge on the interaction. Free energy
of the DNA-protein system for a set of protein charge densities obtained
by PB theory (curves) and by MC simulations (squares). The area of the
concave protein surface S_int is 15 nm². The charge density is
σ_prot = Z_prot/S_int, with Z_prot the charge of the protein at the
interface. The charge density of DNA is σ_DNA = -1 e nm⁻². In the MC
simulations, the shape model for the DNA-BP is a cylinder of height
5 nm, with a concave interface (C⊥ = -1/R_DNA, C∥ = 0). The protein
charges are distributed on a pattern of 16 sites, 0.1 nm below the
surface of the cylindrical cavity.

To provide a rational basis for the simulation results, we carry out
statistical mechanical calculations within the Poisson-Boltzmann (PB)
framework. The complementary interacting surfaces of the protein and
the DNA are described by a minimal model: two charged parallel plates
separated by a distance L. In agreement with the MC results, this
model predicts a minimum of the free energy, whose depth and position
can be expressed analytically [14, 15]. Moreover, we introduce
corrections to the plate-plate model to account for the actual
curvature of protein and DNA by rescaling both the interface area
S_int and the charge density. More precisely, the PB free energy is
integrated over S_int after projection of each surface element on the
plane orthogonal to the L axis [16]. If R and h are the radius and
height of the cylindrical interface, the interaction free energy is
given by

$$
F(L) = \int_{-h/2}^{h/2} \mathrm{d}x \int_{-R}^{R} \mathrm{d}y \; E(z)\,\sqrt{1 - y^2/R^2} = E(L)\,\frac{S_{\mathrm{int}}}{2},
$$

where E is the interaction free energy per unit area for two parallel
plates and z the distance between two surface elements of the curved
bodies facing each other (z = L for the matching surfaces considered
here). The effective charge densities used in the PB calculation are
obtained by fitting all the Monte Carlo results simultaneously
(σ*_DNA = 0.6 σ_DNA and σ*_prot ≃ 1.2 σ_prot). Despite the nanometer
size of the interface, the Poisson-Boltzmann results agree remarkably
well with the Monte Carlo simulations for the concave DNA-BP model
(Fig. 2).
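The closing equality in F(L) rests on the projected weight √(1 − y²/R²) integrating to half the area of a half-cylindrical cavity. Identifying S_int = πRh (a half-cylinder of radius R and height h) is our assumption about the cavity geometry. The Python sketch below checks the identity numerically with a midpoint rule. It takes the Fig. 2 values S_int = 15 nm² and h = 5 nm and treats E(z) = E(L) as a constant factor. The function name is hypothetical.

```python
import math


def projected_weight_integral(R: float, h: float, n: int = 200_000) -> float:
    """Midpoint-rule estimate of the double integral of sqrt(1 - y^2/R^2)
    over x in [-h/2, h/2] and y in [-R, R] (the weight in F(L) with E = 1)."""
    # The integrand does not depend on x, so the x-integral is just a factor h.
    dy = 2.0 * R / n
    total = 0.0
    for k in range(n):
        y = -R + (k + 0.5) * dy
        total += math.sqrt(1.0 - (y / R) ** 2) * dy
    return h * total


# Assumed geometry: half-cylinder cavity with S_int = pi * R * h.
h = 5.0        # nm, cavity height (Fig. 2)
S_int = 15.0   # nm^2, interface area (Fig. 2)
R = S_int / (math.pi * h)
weight = projected_weight_integral(R, h)
print(f"R = {R:.3f} nm, integral = {weight:.4f} nm^2, S_int/2 = {S_int / 2:.4f} nm^2")
```

With these numbers R ≈ 0.95 nm, and the integral reproduces S_int/2 = 7.5 nm² to well within the discretization error.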

FIG. 3: Ionic density fields. The density is obtained by PB theory (a)
and by MC simulations (b) for two protein charge densities and two
distances L. The unit is the bulk osmolarity, 0.2 mol/L. In the PB
treatment, the system is translationally invariant along the plates,
and the density along the direction x perpendicular to the plates is
plotted (x = 0 on the protein and x = L on DNA). In the simulations,
the DNA-BPs are translationally invariant along the DNA axis, and the
ionic density in the plane perpendicular to the DNA axis is plotted.

Furthermore, the PB results shed light on the two physical mechanisms
inducing an attraction and a repulsion between oppositely charged
bodies. The cations and anions between the plates are in equilibrium
with a bulk reservoir (μVT ensemble). Here, this equilibrium displays
two regimes: a counterion-dominated regime, for which the number of
ions between the plates is dominated by the counterions neutralizing
DNA (N+ ≫ N−), and a salt-dominated regime (N+ − N− ≪ N−). It is
established that the salt-dominated regime is attractive, because
the salt release is favorable both entropically (because the volume
between the plates decreases) and electrostatically (because the
plates are oppositely charged) [17]. As expected, the ionic density
decreases as the charged plates approach each other in the particular
case σ_prot = |σ_DNA| (i.e. N+ = N−), representative of this regime
(Fig. 3a). Nevertheless, if σ_prot < |σ_DNA|, a constant number of
neutralizing counterions remains confined between the plates in order
to maintain electroneutrality. As L decreases, these cations become
more and more concentrated. Below a given distance, this counterion
trapping dominates the salt release (counterion-dominated regime).
Indeed, the ionic density increases as L decreases for
σ_prot = -0.2 σ_DNA (Fig. 3a). The resulting enhancement of the
osmotic pressure exceeds the salt-mediated attraction and results in
a global repulsion.
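The counterion-trapping argument can be made semi-quantitative with an ideal Donnan estimate. This is a cruder model than the PB calculation used here and is swapped in only for illustration. It assumes that the gap holds a net cation excess (|σ_DNA| − σ_prot)/L per unit volume and that c+ c− = c0² (ideal equilibrium with the reservoir). Under these assumptions, the van 't Hoff excess osmotic pressure grows as L shrinks. The salt concentration (0.1 mol/L of a 1:1 salt, i.e. a bulk osmolarity of 0.2 mol/L) and |σ_DNA| = 1 e nm⁻² are assumptions, and the function names are hypothetical. The estimate captures only the repulsive trapping term, not the salt-release attraction.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def molar_to_per_nm3(c_molar: float) -> float:
    """Convert a concentration in mol/L into a number density in nm^-3."""
    return c_molar * AVOGADRO / 1e24  # 1 L = 1e24 nm^3


def donnan_excess_pressure(delta_sigma: float, L: float, c0: float) -> float:
    """Ideal Donnan estimate of the excess osmotic pressure (units of k_B T / nm^3)
    in a gap of width L (nm) that must hold a net cation excess delta_sigma
    (e/nm^2) while exchanging 1:1 salt with a reservoir of density c0 (nm^-3)."""
    delta = delta_sigma / L                       # c+ - c-, imposed by electroneutrality
    s = math.sqrt(delta * delta / 4.0 + c0 * c0)  # from the Donnan condition c+ * c- = c0^2
    c_plus = s + delta / 2.0
    c_minus = s - delta / 2.0
    return (c_plus + c_minus) - 2.0 * c0          # van 't Hoff excess over the reservoir


c0 = molar_to_per_nm3(0.1)    # assumption: 0.1 mol/L of 1:1 salt
sigma_dna = 1.0               # assumption: |sigma_DNA| = 1 e/nm^2 (hypothetical round value)
gaps = (2.0, 1.0, 0.5, 0.25)  # nm
for ratio in (1.0, 0.2):      # sigma_prot / |sigma_DNA|
    delta_sigma = (1.0 - ratio) * sigma_dna
    pressures = [donnan_excess_pressure(delta_sigma, L, c0) for L in gaps]
    print(ratio, [round(p, 3) for p in pressures])
```

For σ_prot = |σ_DNA| this estimate gives no excess pressure. For σ_prot = 0.2|σ_DNA| the excess rises monotonically as the gap closes, mirroring the counterion-dominated regime.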

To visualize how this mechanism applies to a more realistic
interface, we compute the ionic density by MC simulations. As shown
in Fig. 3b, the two regimes are similar to those observed with the
two-plate model. This highlights the significance of electroneutrality
effects for the nanometric interfaces of biopolymers. Indeed, since
the Debye length λ_D (i.e. the range of charge inhomogeneities in
solution) is of the order of a nanometer, strong electric fields can
appear locally and trap ions in a very confined space. Moreover, this
physical picture explains the influence of shape complementarity: the
interface is then large enough (relative to λ_D) and the gap thin
enough to trap cations within a small volume.
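The statement that λ_D is of the order of a nanometer can be checked with the standard Debye-Hückel expression for a 1:1 salt, λ_D = (8π l_B c_s)^(-1/2). Here l_B = e²/(4πε0 ε_r k_BT) is the Bjerrum length and c_s the salt number density. The sketch uses ε_r = 78.25 from the model. The temperature (298.15 K) and the salt concentration (0.1 mol/L, consistent with a bulk osmolarity of 0.2 mol/L) are assumptions.

```python
import math

# Physical constants in SI units (CODATA values)
E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F/m
K_B = 1.380649e-23           # J/K
AVOGADRO = 6.02214076e23     # 1/mol


def bjerrum_length_nm(eps_r: float, T: float) -> float:
    """Distance (nm) at which two elementary charges interact with energy k_B T."""
    return E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * K_B * T) * 1e9


def debye_length_nm(c_salt_molar: float, eps_r: float, T: float) -> float:
    """Debye length (nm) of a 1:1 electrolyte at salt concentration c (mol/L)."""
    n_salt = c_salt_molar * 1e3 * AVOGADRO * 1e-27  # salt pairs per nm^3
    l_b = bjerrum_length_nm(eps_r, T)
    kappa_sq = 8.0 * math.pi * l_b * n_salt         # nm^-2; cations and anions both counted
    return 1.0 / math.sqrt(kappa_sq)


l_b = bjerrum_length_nm(78.25, 298.15)   # assumed T = 298.15 K
l_d = debye_length_nm(0.1, 78.25, 298.15)  # assumed 0.1 mol/L of 1:1 salt
print(f"Bjerrum length = {l_b:.3f} nm, Debye length = {l_d:.3f} nm")
```

Under these assumptions l_B ≈ 0.72 nm and λ_D ≈ 0.96 nm, in line with the nanometer scale invoked above.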

To what extent do real DNA-BPs trap ions between their surface and
DNA? To answer this question, we perform a statistical analysis of the
protein interface charge densities and complementary surface areas on
a data set of 77 proteins. The charge densities of these proteins are
not directly available, but DNA-BPs are characterized by conserved
propensities of charged residues at the interface region, as defined
in Ref. [18]. For each protein in the data set, we evaluate the total
number of residues N^prot and the number N_i^prot of residues of type
i for the charged residues (i = Arg, Lys, Asp and Glu). Refs. [18] and
[2] provide N^int, the number of residues at the interface. We
estimate the charge densities of the proteins by approximating the
propensity of a residue i by (N_i^int/N^int)/(N_i^prot/N^prot); this
yields the number N_i^int of residues i at the interface, and thus
the number of charges. We take a mean interface area per residue of
0.70 nm² [19] to derive the mean charge density σ_prot. For
sequence-specific DNA-BPs such as transcription factors and
restriction enzymes, we obtain σ_prot = (0.17 ± 0.03)|σ_DNA|. We
also notice that the less specific DNA-BPs (polymerases, DNA-repair
proteins, histones) are more charged (σ_prot = (0.27 ± 0.05)|σ_DNA|).
The area of the fitting interface, S_prot = 15 ± 5 nm², is similar for
all DNA-BPs. According to these structural features, DNA-BPs should
thus be repelled by DNA at short range (cf. Fig. 2). This repulsion,
obtained with a coarse-grained model, is in agreement with simulations
of atomic models of BamHI [20], which show a repulsion when the
concave surface of the protein approaches DNA.
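The estimate of σ_prot described above reduces to a few lines of arithmetic. The propensity relation is inverted to N_i^int = p_i N^int N_i^prot/N^prot. The interface charge is (Arg + Lys) − (Asp + Glu), and the interface area is N^int × 0.70 nm². In this Python sketch the residue counts and propensities p_i are hypothetical placeholders, not values from Ref. [18] or from the 77-protein data set. Only the four charged residues listed above are counted.

```python
AREA_PER_RESIDUE_NM2 = 0.70  # mean interface area per residue (value from the text)
CHARGE = {"Arg": +1, "Lys": +1, "Asp": -1, "Glu": -1}


def interface_charge_density(n_prot_total: int,
                             n_prot_by_type: dict[str, int],
                             n_int_total: int,
                             propensity: dict[str, float]) -> float:
    """Mean interface charge density (e/nm^2) of one protein.

    Inverts propensity_i ~ (N_i^int / N^int) / (N_i^prot / N^prot)
    to estimate N_i^int, then divides the net charge by the interface area.
    """
    charge = 0.0
    for res, z in CHARGE.items():
        n_i_int = propensity[res] * n_int_total * n_prot_by_type[res] / n_prot_total
        charge += z * n_i_int
    area = n_int_total * AREA_PER_RESIDUE_NM2
    return charge / area


# Hypothetical protein: 250 residues in total, 24 of them at the interface.
counts = {"Arg": 14, "Lys": 18, "Asp": 12, "Glu": 16}
propensities = {"Arg": 2.0, "Lys": 1.6, "Asp": 0.4, "Glu": 0.5}  # placeholders, not Ref. [18]
sigma_prot = interface_charge_density(250, counts, 24, propensities)
print(f"sigma_prot = {sigma_prot:.3f} e/nm^2")
```

For these made-up inputs the net interface charge is about +4.2 e over 16.8 nm². Dividing such a result by |σ_DNA| gives ratios of the kind quoted above.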

To assess whether this repulsion is still significant after the
addition of a realistic short-range attraction, we include H-bond
interactions and study the resulting free energy as a function of the
protein position z along the sequence and of the distance L between
the surfaces. We consider a DNA-BP model of charge
σ_prot = 0.17|σ_DNA| with a fitting shape. We account for each H-bond
by a Morse potential term V_M(L) = D[(exp(-aL) - 1)² - 1] with
D = 0.5 k_BT [21] and a = 20 nm⁻¹ [22]. Crystal structures of
protein-DNA complexes provide the number n_spec of H-bonds at the
specific site (30 H-bonds for S_int = 20 nm²). We assume that the
number n of H-bonds that the protein can make on non-specific DNA
follows a Gaussian distribution of average ⟨n⟩ = n_spec/3 and
standard deviation σ_n = √⟨n⟩. The value of ⟨n⟩ is low because the
number of H-bonds decreases dramatically for non-specific sequences,
even for sequences with a high degree of homology to the target one,
as observed in the crystal structure of the non-cognate BamHI complex
in Ref. [5].
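The H-bond model above can be sketched in a few lines of Python. The sketch defines V_M(L) with the quoted D and a, and draws n for each non-specific position from a Gaussian of mean n_spec/3 and standard deviation √(n_spec/3). Clipping negative draws to zero and keeping n real-valued are our assumptions, and the function names are hypothetical.

```python
import math
import random

D = 0.5    # Morse well depth, in k_B T (value quoted in the text)
A = 20.0   # Morse range parameter a, in nm^-1 (value quoted in the text)


def morse(L: float, depth: float = D, a: float = A) -> float:
    """Single H-bond term V_M(L) = D[(exp(-a L) - 1)^2 - 1], in k_B T."""
    return depth * ((math.exp(-a * L) - 1.0) ** 2 - 1.0)


def sample_nonspecific_hbonds(n_spec: float, n_sites: int, seed: int = 0) -> list[float]:
    """Draw the number of H-bonds at each non-specific position from a Gaussian
    of mean n_spec/3 and standard deviation sqrt(mean); negative draws -> 0."""
    rng = random.Random(seed)
    mean = n_spec / 3.0
    return [max(0.0, rng.gauss(mean, math.sqrt(mean))) for _ in range(n_sites)]


n_spec = 30.0  # H-bonds at the specific site (for S_int = 20 nm^2)
n_nonspec = sample_nonspecific_hbonds(n_spec, n_sites=29)
print(n_spec * morse(0.0))                              # -15.0 (k_B T, specific site at contact)
print(sum(n_nonspec) / len(n_nonspec) * morse(0.0))     # roughly a third of that
```

At contact the specific site contributes n_spec × (−D) = −15 k_BT, while a typical non-specific position contributes about a third of that. The potential vanishes as exp(−aL) decays, so all these contributions are short-ranged.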

FIG. 4: Free energy landscape. The free energy is computed along a
30 bp DNA sequence, as a function of L and of the protein coordinate
along DNA (z), for σ_prot = 0.17|σ_DNA|. The spacing between level
lines is k_BT. For clarity, the additional lower graph displays the
free energy as a function of L for each z value. In both graphs, the
black curve corresponds to a randomly chosen non-specific coordinate,
while the red curve corresponds to the specific site.

The resulting free energy landscape is shown in Fig. 4. Remarkably,
the osmotic repulsion between sequence-specific DNA-BPs and DNA
dominates along non-specific sequences. The equilibrium gap distance
of nearly 0.5 nm is in agreement with the distance observed in the
complexes of EcoRV with non-specific sequences (0.51 nm [1]).
Interestingly, along the equilibrium valley, the roughness of the
sequence-dependent part of the potential is screened out: the protein
can therefore easily slide along DNA. At the target site, the large
H-bond interaction significantly reduces the barrier, and the protein
can approach the DNA.
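The screening of the sequence-dependent roughness at the equilibrium gap follows from the range of the Morse term. At L ≈ 0.5 nm, exp(−aL) = exp(−10) ≈ 4.5 × 10⁻⁵, so n(z)V_M(L) is nearly zero whatever n(z). The Python sketch below illustrates only this H-bond part of the landscape; the electrostatic part, computed by PB and MC in the text, is not reproduced. The random sequence of H-bond numbers, the seed and the position of the specific site are hypothetical.

```python
import math
import random
import statistics

D, A = 0.5, 20.0   # Morse depth (k_B T) and range (nm^-1), values from the text
N_SPEC = 30.0      # H-bonds at the specific site
SITE = 15          # hypothetical index of the specific site on a 30 bp stretch


def morse(L: float) -> float:
    """Single H-bond term V_M(L) = D[(exp(-a L) - 1)^2 - 1], in k_B T."""
    return D * ((math.exp(-A * L) - 1.0) ** 2 - 1.0)


def hbond_numbers(n_bp: int = 30, seed: int = 1) -> list[float]:
    """Hypothetical n(z): Gaussian(n_spec/3, sqrt(n_spec/3)) off target, n_spec on target."""
    rng = random.Random(seed)
    mean = N_SPEC / 3.0
    n = [max(0.0, rng.gauss(mean, math.sqrt(mean))) for _ in range(n_bp)]
    n[SITE] = N_SPEC
    return n


def roughness(L: float, n_of_z: list[float]) -> float:
    """Standard deviation (k_B T) of the H-bond energy n(z) V_M(L) over non-specific z."""
    v = morse(L)
    off_target = [n * v for i, n in enumerate(n_of_z) if i != SITE]
    return statistics.pstdev(off_target)


n_of_z = hbond_numbers()
for L in (0.1, 0.3, 0.5):
    print(f"L = {L} nm: roughness {roughness(L, n_of_z):.1e} k_BT, "
          f"specific site {N_SPEC * morse(L):.1e} k_BT")
```

In this sketch the off-target roughness is a sizeable fraction of k_BT at L = 0.1 nm but only of order 10⁻⁴ k_BT at 0.5 nm. Along the valley, the sequence therefore no longer modulates the landscape.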

Our results unravel a subtle balance between long-range
electrostatic attraction, short-range osmotic repulsion and
short-range H-bond attraction. This effect is sensitive to the shape
and charge of DNA-BPs, and should thus have contributed to the
structural evolution of these proteins. From a dynamical perspective,
our model provides a new basis to reconcile the dual requirement of
high protein mobility and high sequence sensitivity [23-25]. Indeed,
the latter is usually assumed to slow down protein diffusion
[26, 27]. According to our results, the DNA-BP diffuses freely along
non-specific DNA, confined in an electrostatic free energy valley.
The free energy barrier, which keeps the protein at a distance from
DNA, is also a signature of the sequence: transverse thermal
fluctuations enable the protein to cross the barrier only at the
specific site or at highly homologous sequences. This recognition
mechanism is efficient because it does not require the protein to
probe the molecular details of non-specific DNA sequences. The
implications of such a behavior for the one-dimensional (1D)
diffusion of proteins along DNA, recently observed both in vitro and
in vivo [7, 28-30], will be the subject of future investigations.